Advanced Statistical Analysis of Fitness Data for Predictive Insights & Personalization
Research Project (SMST604)
Purva Amit Puranik (3032411004)
DES Pune University — Dept. of Statistics
Omkar Nilesh Ninav (3532411003)
DES Pune University — Dept. of Statistics
2025-10-14
Motivation Behind the Project
Fitness data reflecting exercise patterns, body composition, and lifestyle factors offer valuable opportunities for quantitative analysis.
Understanding how these factors interact can help reveal key determinants of progress and explain variations in individual performance.
The project focuses on building statistical and predictive models to detect performance plateaus and forecast outcomes for fitness improvement.
Metrics such as the Resilience Index and Progress Ratio aim to make analytics more interpretable and actionable for individual users.
Proposed Research Questions
To what extent does consistent workout completion (e.g., meeting weekly or monthly targets) lead to a significant improvement in body fat percentage?
Can a Bayesian Structural Time Series (BSTS) model accurately forecast stagnation in key physiological metrics (e.g., body fat %, BMI) at least one week in advance?
What distinct client clusters emerge from an unsupervised analysis of longitudinal activity and progress data (e.g., Fast Responders, At-Risk Plateauers)?
How can the Resilience Index and Progress Ratio be quantitatively defined and operationalized, and do these metrics correlate with long-term user success and engagement?
Technical Details and Domain Knowledge
Interdisciplinary Scope
Combines statistics, exercise physiology, and wearable technology.
Uses synthetically generated data designed to reflect real-world exercise and lifestyle patterns.
Bridges the gap between data science and human performance understanding.
Personalization and Adaptation
Applies clustering to identify groups with similar fitness responses.
Uses adaptive models that update as new data are generated.
Enables personalized progress tracking and interpretable metrics such as the Resilience Index and Progress Ratio.
Statistical Methodologies
General Analysis
Descriptive statistics to summarize fitness metrics (e.g., mean, variance, correlation).
Inferential tests to examine the impact of workout consistency on body fat %.
Regression modeling to quantify relationships between input features and outcomes.
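The three steps above can be sketched without any external dependencies. The snippet below computes the mean and sample variance of a metric, the Pearson correlation between two variables, and a one-predictor least-squares fit. The values and the names `sessions` and `fat_change` are made-up illustrations, not project data.

```python
import math
import statistics

# Hypothetical illustrative values: workouts completed per week and the
# change in body fat % over the same period (negative = fat loss).
sessions = [1, 2, 3, 4, 5, 6]
fat_change = [-0.1, -0.3, -0.4, -0.6, -0.9, -1.0]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


mean_change = statistics.fmean(fat_change)
var_change = statistics.variance(fat_change)  # sample variance (n - 1 denominator)
r = pearson_r(sessions, fat_change)
slope, intercept = simple_ols(sessions, fat_change)
print(f"mean={mean_change:.3f} var={var_change:.3f} r={r:.3f} slope={slope:.3f}")
```

A negative slope here would read as "each additional weekly session is associated with a larger fat loss"; it describes association only, which is why RQ1 turns to causal methods.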
Predictive Modeling
Bayesian Structural Time Series (BSTS) to forecast stagnation in metrics like body fat % or BMI.
Model validation through posterior predictive checks and forecast accuracy measures (e.g., MAE, RMSE).
Supports early detection of plateaus and forecasting of future performance.
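To make the forecasting idea concrete, here is a simplified sketch rather than a full BSTS fit. It runs a Kalman filter for a local linear trend model (a basic structural time-series component) with fixed, assumed variances, then reads off the posterior probability that the weekly slope is close to zero. A full BSTS analysis would also estimate the variances from the data. The function names, the variance values, the `tolerance` threshold, and the two example series are all assumptions for illustration.

```python
import math


def filter_local_linear_trend(y, obs_var=0.05, level_var=0.01, slope_var=0.001):
    """Kalman-filter a series under a local linear trend model.

    Returns the final state mean [level, slope], its 2x2 covariance,
    and the list of one-step-ahead forecast errors.
    """
    m = [y[0], 0.0]                   # start at the first observation, zero slope
    P = [[1.0, 0.0], [0.0, 1.0]]      # assumed vague prior covariance
    errors = []
    for obs in y[1:]:
        # Predict: the level moves by the slope; the slope is a random walk.
        m_pred = [m[0] + m[1], m[1]]
        p00 = P[0][0] + P[0][1] + P[1][0] + P[1][1] + level_var
        p01 = P[0][1] + P[1][1]
        p10 = P[1][0] + P[1][1]
        p11 = P[1][1] + slope_var
        # Update: only the level is observed.
        err = obs - m_pred[0]
        s = p00 + obs_var
        k0, k1 = p00 / s, p10 / s
        m = [m_pred[0] + k0 * err, m_pred[1] + k1 * err]
        P = [[p00 - k0 * p00, p01 - k0 * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
        errors.append(err)
    return m, P, errors


def plateau_probability(m, P, tolerance=0.1):
    """Posterior probability that the weekly slope lies within +/- tolerance."""
    sd = math.sqrt(P[1][1])

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - m[1]) / (sd * math.sqrt(2.0))))

    return cdf(tolerance) - cdf(-tolerance)


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical weekly body fat % series: steady decline vs. no change.
declining = [30.0 - 0.5 * t for t in range(12)]
flat = [25.0] * 12

m_d, P_d, err_d = filter_local_linear_trend(declining)
m_f, P_f, err_f = filter_local_linear_trend(flat)
p_decline = plateau_probability(m_d, P_d)
p_flat = plateau_probability(m_f, P_f)
print(f"P(plateau | declining)={p_decline:.4f}  P(plateau | flat)={p_flat:.4f}")
print(f"one-step MAE={mae(err_d):.3f} RMSE={rmse(err_d):.3f}")
```

The flat series should receive a much higher plateau probability than the declining one, and the one-step-ahead errors feed directly into MAE and RMSE.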
Unsupervised Learning
Clustering algorithms (e.g., K-Means, Hierarchical, or DBSCAN) to segment users.
Identify behavioral groups such as Fast Responders or At-Risk Plateauers.
Use Principal Component Analysis (PCA) for dimensionality reduction and visualization.
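As a minimal illustration of the segmentation step, the sketch below implements plain K-Means (Lloyd's algorithm) with a deterministic start from the first `k` points. The two features and the six user records are hypothetical. In practice, features would be standardized first, and labels such as Fast Responders would come from interpreting the resulting centroids, not from the algorithm itself.

```python
import math


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm; centroids are initialised from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical users: (workouts per week, monthly change in body fat %).
users = [(5, -1.0), (1, -0.1), (6, -1.2), (2, 0.0), (5, -0.9), (1, -0.2)]
labels, centroids = kmeans(users, k=2)
print(labels)
print(centroids)
```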
Custom Metrics and Correlation
Define the Resilience Index (a measure of recovery and adaptability) and the Progress Ratio (a measure of the rate of improvement).
Validate these metrics using correlation and regression analysis against long-term success indicators.
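Neither metric has a fixed formula at this stage, so the following is only one hypothetical operationalization, offered to show what a computable definition could look like. Here the Progress Ratio is the observed weekly rate of improvement divided by a target weekly rate, and the Resilience Index is the share of setbacks (weeks where the metric worsens) from which the user returns to the pre-setback level within `window` weeks. Both definitions, the parameter names, and the example series are assumptions.

```python
def progress_ratio(series, target_weekly_loss):
    """Observed weekly loss divided by the target weekly loss (hypothetical definition)."""
    weeks = len(series) - 1
    if weeks < 1 or target_weekly_loss <= 0:
        raise ValueError("need at least two observations and a positive target")
    observed_weekly_loss = (series[0] - series[-1]) / weeks
    return observed_weekly_loss / target_weekly_loss


def resilience_index(series, window=2):
    """Share of setbacks recovered within `window` weeks (hypothetical definition).

    A setback is a week-over-week increase in a lower-is-better metric.
    Returns 1.0 when the series contains no setbacks.
    """
    setbacks = recovered = 0
    for t in range(1, len(series)):
        if series[t] > series[t - 1]:
            setbacks += 1
            pre_setback = series[t - 1]
            if any(v <= pre_setback for v in series[t + 1:t + 1 + window]):
                recovered += 1
    return recovered / setbacks if setbacks else 1.0


# Hypothetical weekly body fat % readings for one user.
body_fat = [25.0, 24.6, 24.9, 24.5, 24.3, 24.6, 24.7, 24.5]
pr = progress_ratio(body_fat, target_weekly_loss=0.25)
ri = resilience_index(body_fat, window=2)
print(f"progress ratio={pr:.3f} resilience index={ri:.3f}")
```

Under these definitions a Progress Ratio of 1 means the user is exactly on target, and a Resilience Index near 1 means setbacks are usually short-lived.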
Proposed Analytical Pipeline & Flow
The analytical workflow integrates data preprocessing, sampling, and the analyses for the four core research questions (RQ1–RQ4).
Covers causal inference, time-series forecasting, clustering, and personalized metric modeling.
RQ1 – Causal Effect of Workout Consistency
Estimate the effect of workout consistency on body fat % using Marginal Structural Models (MSM).
Apply Inverse Probability of Treatment Weighting (IPTW) for bias adjustment.
Validate robustness through sensitivity analysis and model diagnostics.
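The weighting idea behind IPTW can be shown in a deliberately reduced setting: a single time point, a binary treatment, and one binary confounder, with the propensity score estimated from stratum frequencies. A marginal structural model for longitudinal data would instead multiply such weights across time points. The records, field meanings, and function names below are hypothetical.

```python
from collections import Counter

# Hypothetical records: (baseline_active, consistent, fat_change).
# baseline_active is the confounder, consistent is the treatment.
records = [
    (1, 1, -1.2), (1, 1, -1.0), (1, 1, -1.1), (1, 0, -0.6),
    (0, 1, -0.8), (0, 0, -0.2), (0, 0, -0.3), (0, 0, -0.1),
]


def stabilized_weights(records):
    """Stabilized IPTW: P(treatment) / P(treatment | confounder stratum)."""
    n = len(records)
    by_treatment = Counter(a for _, a, _ in records)
    by_stratum = Counter(s for s, _, _ in records)
    by_both = Counter((s, a) for s, a, _ in records)
    weights = []
    for stratum, treated, _ in records:
        marginal = by_treatment[treated] / n
        propensity = by_both[(stratum, treated)] / by_stratum[stratum]
        weights.append(marginal / propensity)
    return weights


def weighted_mean_difference(records, weights):
    """Weighted mean outcome of the treated minus that of the untreated."""
    def arm_mean(arm):
        pairs = [(w, y) for (_, a, y), w in zip(records, weights) if a == arm]
        return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)
    return arm_mean(1) - arm_mean(0)


weights = stabilized_weights(records)
naive = weighted_mean_difference(records, [1.0] * len(records))
adjusted = weighted_mean_difference(records, weights)
print(f"naive difference={naive:.3f} IPTW-adjusted difference={adjusted:.3f}")
```

In this toy data, already-active users are both more likely to be consistent and more likely to lose fat, so the unweighted comparison overstates the effect and the weighted one pulls it back toward the within-stratum difference.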
RQ2 – Plateau Forecasting
Fit Bayesian Structural Time Series (BSTS) models to forecast stagnation in body fat %.
Evaluate plateau forecasts using posterior plateau probabilities, precision and recall at a chosen probability cutoff, and calibration curves.
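The evaluation step can be sketched as follows, assuming each forecast is a posterior plateau probability paired with a later observed outcome (1 = plateau occurred). The cutoff of 0.5, the five equal-width bins, and the example values are assumptions chosen for illustration.

```python
def precision_recall(probs, actual, cutoff=0.5):
    """Precision and recall when a plateau is flagged at prob >= cutoff."""
    flagged = [p >= cutoff for p in probs]
    tp = sum(1 for f, a in zip(flagged, actual) if f and a)
    fp = sum(1 for f, a in zip(flagged, actual) if f and not a)
    fn = sum(1 for f, a in zip(flagged, actual) if not f and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibration_table(probs, actual, n_bins=5):
    """Rows of (mean predicted probability, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, a in zip(probs, actual):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, a))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(a for _, a in members) / len(members)
            table.append((mean_p, freq, len(members)))
    return table


# Hypothetical forecast probabilities and observed plateau outcomes.
probs = [0.95, 0.85, 0.75, 0.65, 0.45, 0.35, 0.15, 0.05]
actual = [1, 1, 0, 1, 1, 0, 0, 0]

precision, recall = precision_recall(probs, actual)
table = calibration_table(probs, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
for mean_p, freq, count in table:
    print(f"predicted={mean_p:.2f} observed={freq:.2f} n={count}")
```

A well-calibrated model has observed frequencies close to the mean predicted probability in every bin; plotting the two columns against each other gives the calibration curve.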
RQ3 – User Segmentation
Perform Principal Component Analysis (PCA) for dimensionality reduction.
Identify clusters using K-Means and Bayesian Gaussian Mixture Models (BGMM).
Validate cluster stability and interpret behavioral–physiological profiles.
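For the dimensionality-reduction step, the sketch below extracts only the first principal component, using power iteration on the sample covariance matrix and the standard library alone. It covers the PCA step only, not K-Means or BGMM. The starting vector (first coordinate axis) is an assumption and would fail if it happened to be orthogonal to the leading component. The feature names and values are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading PCA direction and per-row scores via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (n - 1 denominator).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] + [0.0] * (d - 1)  # assumed start: first coordinate axis
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical users: (workouts per week, avg. sleep hours, monthly fat loss in % points).
features = [
    [5, 7.5, 1.0], [1, 6.0, 0.1], [6, 8.0, 1.2],
    [2, 6.5, 0.0], [5, 7.0, 0.9], [1, 6.0, 0.2],
]
direction, scores = first_principal_component(features)
print([round(x, 3) for x in direction])
print([round(s, 2) for s in scores])
```

The scores give a one-dimensional summary of each user that can be plotted or passed to a clustering routine; further components would be obtained by deflating the covariance matrix and repeating.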
RQ4 – Personalized Metrics
Compute Progress Ratio and Resilience Index from user trajectories.
Assess associations using OLS regression and Cox survival models.
Apply SHAP values for interpretability and insight into feature importance.